INTRODUCTION


In 2009, the Australian Bureau of Statistics (ABS) was asked by the Council of Australian
Governments (COAG) to look for ways of improving the accuracy of estimates from the
Survey of Education and Work (SEW). The issue of particular concern was the ability to
measure annual change in performance measures/indicators, especially in the smaller 
jurisdictions. The SEW is one of the major data sources used for performance measures in 
the National Education Agreement (NEA) and National Agreement on Skills and Workforce 
Development (NASWD). SEW is run as a supplement to the monthly Labour Force Survey 
(LFS) in May each year with a sample size of about 40,000 persons in 2010. 


The ABS has investigated combining data collected from more than one survey, known as 
data pooling, to improve the accuracy of estimates. Data pooling was seen as potentially a 
cost effective way of increasing the sample size since it was based on making further use of 
data already collected rather than embarking on additional collection, with its consequent 
impact on cost and provider load. 


This report presents the results of data pooling of the SEW over two and four years between 
2007 and 2010. That is, of doubling and quadrupling the sample over the respective time 
periods. It examines the impact of data pooling on the reliability of selected performance 
measures and the capacity of the pooled data to detect statistically significant change over 
time. 


Combining data from different surveys to improve estimates may have applicability in other 
situations, and as such, this report presents detailed technical material on the rationale and 
methodology chosen for data pooling in this instance. 


OVERVIEW 


This section introduces the performance measures and briefly describes trends over the 
past 10 years in order to set the context for analysis. The effect of data pooling on the level 
and reliability of the indicators is then discussed. To assess the potential benefit of data 
pooling for improving the measurement of short-term change, four comparisons are made: 


- single-year data (that is, the usual Survey of Education and Work (SEW) estimates in
  the given years) over the one-year period 2008 to 2009

- single-year data 2009 to 2010

- single-year data over the two-year period 2008 to 2010

- pooled data 2008 to 2010.


THE INDICATORS 


There are four COAG measures that are based on the Survey of Education and Work 
(SEW). In addition to estimates by state and territory, the measures are further 
disaggregated by socioeconomic status. While Aboriginal and Torres Strait Islander status is 
also a key priority for data development, related performance measures on education and 
training for Indigenous people are sourced from Indigenous-specific surveys and therefore 
the topic is outside the scope of this study. 


- NEA 7: The proportion of the 20-24 year old population having attained at least a
  Year 12 or equivalent or Australian Qualification Framework (AQF) Certificate Level II
  or above.

- NEA 9: The proportion of young people aged 15-19 years fully engaged in post
  school education or training. [1,2]

- NEA 10: The proportion of 18 to 24 year olds fully engaged in employment, education
  or training at or above Certificate III. [2]

- NASWD 2: The proportion of 20-64 year olds who do not have qualifications at or
  above Certificate III level.


These four indicators comprise two measures of attainment (NEA 7 and NASWD 2) and two 
of engagement (NEA 9 and NEA 10). Three of the measures would be expected to rise with 
increased participation in education, training or employment, while one indicator (people 
without qualifications - NASWD 2) measures the population at risk and would be expected 
to fall as a result of policies to foster greater participation in education and training. 


TRENDS BETWEEN 2001 AND 2010 


To set the context for examining change over the short term, this section briefly reviews 
trends in the indicators between 2001 and 2010. Over the past 10 years, there has been an 
upward trend in NEA 7 (Year 12/Certificate II, 20-24 years) at the national level, where this 
measure rose by about 6.5 percentage points from 79.1% in 2001 to 85.6% in 2010. Trends 
at the state/territory level are less clear from the survey data. There has been a certain 
element of year to year fluctuation, especially in the jurisdictions with smaller populations, 
most notably the NT. This could well reflect sampling variability and suggests that caution 
should be applied when interpreting change in NEA 7 using standard estimates from the 
SEW, particularly from one year to the next (See the section on significance testing for 
further explanation of sampling variability and measures of change). 


The other attainment measure, NASWD 2 (low qualifications, 20-64 years), is based on a
much larger population (a 45 year age range) than the NEA indicators (based on 5 to 7 year 
age ranges only) and as such has a larger sample and more reliable estimates for all 
jurisdictions. It has also changed more than NEA 7, falling by 12.6 percentage points from 
58.0% in 2001 to 45.4% in 2010. The downward trend over the past 10 years in this ‘at risk’ 
indicator is reasonably clear for all jurisdictions. 


The trend in the engagement measures, NEA 9 (engagement, 15-19 years) and NEA 10 (engagement,
18-24 years), was less clear and appeared to differ among jurisdictions. For both measures,
the overall level of full engagement in employment or education/training was much the same 
in 2010 as it was in 2001 but was subject to some degree of fluctuation over the intervening 
period. At the national level, the rate of full engagement among 15-19 year olds measured 
by NEA 9 was 72.5% in 2001 and 69.8% in 2010. The corresponding rates of full 
engagement for 18-24 year olds from NEA 10 were 71.4% and 72.6%. These measures 
incorporate participation in work or study, both of which are influenced by the economic 
cycle. Like NEA 7, the engagement measures are based on a small sub-population (5 to 7 
year age ranges). Therefore, the pattern over the past 10 years was influenced both by
variation in the actual level of engagement by young people and by sampling variability.
Once again, caution should be exercised when interpreting change in these indicators over 
the short term. 


Time-series data for the indicators can be found in datacubes associated with Education and
Work, Australia (cat. no. 6227.0).


THE POOLED DATASETS


In order to examine trends over the short term and the potential benefits of data pooling,
data were combined from successive cycles of the annual SEW. Two-year data represent
pooling over 2007 to 2008 and 2009 to 2010. Four-year data represent pooling over the
entire period 2007 to 2010.


The pooling methodology used in this study involved benchmarking the initial weights in the 
combined or pooled dataset to required totals based on geographic, demographic and 
labour force characteristics of the population at the end point of the pooling period. As such, 
the reference point for the pooled estimates is that of the latest survey. The samples for 
earlier years in the combined datasets can be regarded as ‘lagging’ samples. For instance, 
when pooling the 2007 and 2008 surveys, the pooled estimates are for the 2008 period and 
the sample for 2007 is assumed to be a lagging sample for 2008 (See methodology section 
for further explanation). 
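
As a rough illustration of the benchmarking idea described above, the following Python sketch rescales the initial weights of a pooled 2007 and 2008 dataset so that the weighted total in each benchmark cell matches a 2008 population total. It is a minimal sketch under stated assumptions, not the ABS procedure: the simple within-cell ratio adjustment, the function name `benchmark_weights`, the cell labels and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def benchmark_weights(records, benchmarks):
    """Scale weights so each cell's weighted total equals its benchmark.

    records    -- list of dicts with keys 'year', 'cell' and 'weight'
    benchmarks -- dict mapping cell label to the population total at the
                  end point of the pooling period (hypothetical values)
    """
    # Sum the initial weights of the pooled sample within each cell.
    weighted_totals = defaultdict(float)
    for rec in records:
        weighted_totals[rec["cell"]] += rec["weight"]

    # Apply one ratio adjustment per cell (an assumed, simplified method).
    return [
        {**rec, "weight": rec["weight"] * benchmarks[rec["cell"]]
                          / weighted_totals[rec["cell"]]}
        for rec in records
    ]


# Hypothetical pooled sample: the 2007 records act as a 'lagging' sample.
pooled = [
    {"year": 2007, "cell": "NSW 20-24", "weight": 500.0},
    {"year": 2007, "cell": "NSW 20-24", "weight": 480.0},
    {"year": 2008, "cell": "NSW 20-24", "weight": 510.0},
    {"year": 2008, "cell": "NSW 20-24", "weight": 510.0},
    {"year": 2007, "cell": "Tas. 20-24", "weight": 30.0},
    {"year": 2008, "cell": "Tas. 20-24", "weight": 34.0},
]
benchmarks_2008 = {"NSW 20-24": 1020.0, "Tas. 20-24": 34.0}

adjusted = benchmark_weights(pooled, benchmarks_2008)
for rec in adjusted:
    print(rec["year"], rec["cell"], round(rec["weight"], 2))
```

After adjustment, the weights in each cell sum to the 2008 benchmark, so the pooled estimate refers to the 2008 reference point even though half of the sample was collected in 2007.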


Accordingly, single and two-year pooled estimates are available for 2008 and single, two- 
year pooled and four-year pooled estimates for 2010. Since the pooled data originates from 
time periods prior to and including the reference year, the pooled educational qualification 
estimates may be affected by structural changes in the population and changes in labour 
force participation during the full period of the pooled data and so are expected to be 
‘lagging’ behind the individual yearly survey estimates for each time point. 


Within the time horizon of this study, 2007 to 2010, only one time comparison (2008 to 2010)
can be made to assess change using two-year pooled estimates. Since only one set of four-year
pooled estimates could be calculated, no time comparison is available from four-year pooled
data.


The gains in the accuracy of estimates from data pooling over two years are effectively 
equivalent to those from doubling the size of the original survey sample, and from four-year 
pooling to conducting a survey with a sample four times that of the original. For SEW, the 
single-year sample in 2010 was about 40,000 persons, the two-year pooled sample 
2009-2010 about 70,000 persons and the four-year pooled sample about 154,000 persons. 


The size of the sample for the Labour Force Survey (LFS) and, as a consequence, 
associated supplementary surveys such as SEW was reduced in 2009 before being 
reinstated in 2010. Therefore the pooled dataset over 2009 to 2010 (about 70,000 persons) 
was smaller than would be expected based on the current sample size (about 80,000 
persons). As a result, the pooled data were somewhat less reliable, and Relative
Standard Errors (RSEs) somewhat higher, than could be expected from future pooling based
on the full LFS sample. As such, potential gains from pooling may be slightly underestimated
in this report.


HOW DID DATA POOLING AFFECT THE LEVEL OF PERFORMANCE MEASURES AND 
THEIR RELIABILITY? 


For the 2010 reference point, the estimated levels of the indicators examined were broadly
similar for single-year and pooled estimates, provided allowance is made for the lag in the
pooled data (Table A below). The largest difference in the indicator, and underlying estimate,
occurred for NASWD 2 (low qualifications, 20-64 years), where the indicator based on
four-year pooled data was 2.2 percentage points, or proportionally about 5%, higher than the
single-year indicator (47.6% compared with 45.4%). The population base for this
indicator is all people aged 20-64 years and so structural changes in the population and
changes in labour force participation would likely have influenced the pooled data. The
other three indicators, each based on a five to seven year age range only, had very
similar values, with a proportional difference of under 2% between single-year and pooled
data.
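
The percentage-point and proportional differences quoted above can be checked directly from the Table A figures. The short Python sketch below does this for the single-year and four-year pooled levels; the helper name `differences` is introduced here for illustration only.

```python
# Indicator levels (%) for the 2010 reference point, taken from Table A:
# (single-year, four-year pooled)
table_a = {
    "NEA 7": (85.6, 84.4),
    "NEA 9": (69.8, 70.4),
    "NEA 10": (72.6, 73.6),
    "NASWD 2": (45.4, 47.6),
}


def differences(single, pooled):
    """Return (percentage-point difference, proportional difference in %)."""
    pp = pooled - single
    return pp, 100.0 * pp / single


for name, (single, pooled) in table_a.items():
    pp, prop = differences(single, pooled)
    print(f"{name}: {pp:+.1f} percentage points, {prop:+.1f}% proportionally")
```

On these figures NASWD 2 differs by 2.2 percentage points (a little under 5% proportionally), while the three NEA indicators each differ by less than 2% proportionally.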


When compared with data taken from a single year, RSEs declined moderately for two-year
pooled data and approximately halved for four-year pooled data. This is in line with the
theoretical calculation that it requires a quadrupling of the sample to halve the standard 
error. Reflecting relative sample sizes, the smallest RSEs were associated with NASWD 2 
(low qualifications, 20-64 years) and the highest with NEA 9 (engagement, 15-19 years). 
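
The relationship between sample size and standard error described above can be sketched numerically. The Python example below assumes that the standard error is proportional to one over the square root of the sample size, as under simple random sampling; the SEW has a complex design, so this is only an approximation. The function name `pooled_rse` is hypothetical. The sample sizes (about 40,000, 70,000 and 154,000 persons) and the single-year RSE of 0.8% for NEA 7 are taken from this report.

```python
import math


def pooled_rse(single_year_rse, single_n, pooled_n):
    """Approximate RSE after pooling, assuming SE is proportional to 1/sqrt(n)."""
    return single_year_rse * math.sqrt(single_n / pooled_n)


single_n = 40_000        # approximate single-year SEW sample, 2010
nea7_single_rse = 0.8    # RSE (%) of NEA 7 at the national level, 2010

two_year = pooled_rse(nea7_single_rse, single_n, 70_000)
four_year = pooled_rse(nea7_single_rse, single_n, 154_000)
exact_quadruple = pooled_rse(nea7_single_rse, single_n, 4 * single_n)

print(f"two-year pooled (approx.):  {two_year:.2f}%")
print(f"four-year pooled (approx.): {four_year:.2f}%")
print(f"exactly four times sample:  {exact_quadruple:.2f}%")
```

Under this assumption the two-year figure comes out close to 0.6% and the four-year figure close to 0.4%, which is broadly in line with the published RSEs of 0.6% and 0.5% for NEA 7. The published values also reflect the sample design and the different level of the pooled estimate, which this sketch ignores.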


TABLE A. LEVEL OF INDICATORS AND RSEs, single-year and pooled data, 2010


             Single-year (2010)          2-year pooled (2009-10)     4-year pooled (2007-10)
Indicator    ‘000      %     RSE of %    ‘000      %     RSE of %    ‘000      %     RSE of %
NEA 7        1,351.3   85.6  0.8         1,343.7   85.1  0.6         1,332.1   84.4  0.5
NEA 9          506.3   69.8  2.1           500.9   69.2  1.4           510.6   70.4  1.0
NEA 10       1,592.0   72.6  1.0         1,599.1   73.0  0.9         1,613.1   73.6  0.7
NASWD 2      5,918.1   45.4  0.6         6,031.3   46.3  0.6         6,209.3   47.6  0.4


DID DATA POOLING ENABLE BETTER DETECTION OF CHANGE OVER TIME? 


This section investigates the detection of change in the selected indicators over time, 
comparing the statistical significance of results using standard single-year data with two- 
year pooled data. In theory, data pooling should lead to better detection of change over time. 
In practice, however, data pooling does not necessarily result in more frequent detection of 
change. Change may have occurred, but the change may not be of a sufficient magnitude to 
be detected, even using pooled data. Alternatively, change may have occurred, but the 
magnitude of change may be large enough to be detected by both standard and pooled 
data. Finally, change may not have occurred, and pooled data should correctly detect no 
change. 


Based on single-year data, there were virtually no detectable changes at the 5% level of 
statistical significance at the national or state/territory level in any of the three National 
Education Agreement (NEA) indicators over the period 2009 to 2010 (Table B below). While 
there were some statistically significant results for change within sub-categories of the 
indicators, such as Socio-Economic Indexes for Areas (SEIFA) quintile, these may have been
due to random chance rather than actual change. As already noted, even without further
disaggregation by highest year of school completed or SEIFA quintile, these indicators refer
to sub-populations of young people determined by narrow age ranges and as a
consequence are based on small samples (Datacube Table 10).
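
The remark that some significant results within sub-categories may be due to random chance can be illustrated with a simple calculation. The Python sketch below assumes that the tests are independent and that no real change has occurred, which is a simplification; the figure of 40 tests (for example, five SEIFA quintiles in each of eight jurisdictions) is a hypothetical count chosen for illustration, not a number taken from this report.

```python
def prob_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one 'significant' result when no real change exists,
    assuming independent tests each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of spurious significant results under the same assumptions."""
    return num_tests * alpha


num_tests = 40  # hypothetical: 5 quintiles x 8 jurisdictions
print(f"P(at least one false positive): {prob_any_false_positive(num_tests):.2f}")
print(f"Expected false positives:       {expected_false_positives(num_tests):.1f}")
```

With many sub-category comparisons, a handful of nominally significant results is therefore to be expected even when nothing has changed, which is why isolated findings for individual quintiles are treated with caution.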


Over the previous year 2008 to 2009, by contrast, the two engagement measures NEA 9 
(engagement, 15-19 years) and NEA 10 (engagement, 18-24 years) showed statistically 
significant change at the national level and in Victoria (both measures) and Queensland 
(NEA 9 only) (Table B below). At the national level, the fall in full engagement in employment 
and education/training was associated with a fall in employment (Datacube Table 1c). 


Even when change in NEA 7 was assessed over the three-year period 2007 to 2010 (using
single-year data), statistically significant rises in Year 12/Certificate II or above
attainment for 20-24 year olds were found at the national level and for NSW only. No change was
detectable in the other jurisdictions.


A small number of statistically significant results were observed for change in NEA 
indicators in two-year pooled data, but this was over a two-year period 2008 to 2010 (Table 
C below). At the national level, falls in NEA 9 (engagement, 15-19 years) and NEA 10 
(engagement, 18-24 years) were statistically significant, both for the proportion of people 
fully engaged and for full-time employment (Datacube Table 1c). For NEA 10, full-time 
education/training showed a statistically significant rise, but this was not sufficient to offset 
the decline in employment (Table 1c). At the state/territory level, a statistically significant 
change was observed in Victoria for NEA 9 and NEA 10, and in Queensland for NEA 9. 


Whereas the pooled data was assessed over the two-year period 2008 to 2010, the single- 
year data was assessed over the one-year period 2008 to 2009 or 2009 to 2010. The 
greatest annual change in the engagement indicators NEA 9 and NEA 10 occurred between 
2008 and 2009 during the economic downturn of the Global Financial Crisis. There was 
comparatively little change at the national level in engagement between 2009 and 2010. 
Thus the comparison between single-year and pooled data is affected by the time period 
over which the comparison is made. Indeed, had data been pooled over 2008-09, the fall in 
engagement may have been obscured. If datasets were available, pooling over one year 
would help overcome these issues. 


Looking again at single year data over 2009 to 2010, the indicator from the National 
Agreement on Skills and Workforce Development, NASWD 2 (low qualifications, 20-64 
years), based on a much larger age group and subject to greater annual change, showed a 
significant fall at the national level and for Victoria (Table B below). There was also a 
significant fall at the national level for two age groups: 35-44 years and 45-54 years 
(Datacube Table 1c). Similarly, there was a statistically significant fall in some age groups for 
Victoria (Datacube Table 3c). Over the previous year, 2008 to 2009, there was a detectable 
fall in this indicator at the national level but not in any of the states/territories. By 
comparison, based on pooled data, all jurisdictions except Tasmania and the ACT recorded
statistically significant falls in the proportion of the 20-64 year old population with
qualifications below Certificate III between 2008 and 2010 (Table C below).


At first glance, these findings might appear to endorse the benefits of data pooling in the 
case of NASWD 2. Certainly, since this indicator steadily declines, data pooling allows the 
trend over a two-year period to be detected as significant in more cases than when using 
single-year data over one year. However, this raises the question of whether or not the trend
would also have been detected as significant if single-year data were compared over the 
same two-year period. This issue is examined immediately below. 


TABLE B. STATISTICAL SIGNIFICANCE OF CHANGE (a), single-year data, 2008 to
2009 and 2009 to 2010


             NEA 7               NEA 9               NEA 10              NASWD 2
             2008 to   2009 to   2008 to   2009 to   2008 to   2009 to   2008 to   2009 to
             2009      2010      2009      2010      2009      2010      2009      2010
NSW          no        no        no        no        no        no        no        no
Vic.         no        no        yes       no        yes       no        no        yes
Qld          no        no        yes       no        no        no        no        no
SA           no        no        no        no        no        no        no        no
WA           no        no        no        no        no        no        no        no
Tas.         no        no        no        no        no        no        no        no
NT           no        no        no        no        no        no        no        no
ACT          no        no        no        no        no        no        no        no
Australia    no        no        yes       no        yes       no        yes       yes


a. Based on 2-tail test at 5% level of significance (Z=1.96)
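
The two-tail test at the 5% level (Z=1.96) noted under Table B can be sketched as follows. This is a minimal Python illustration, assuming the two estimates being compared are independent so that the standard error of the difference is the square root of the sum of the squared standard errors; the function names are hypothetical. The NEA 9 proportions (74.3% in 2008, 68.4% in 2009, 69.8% in 2010) and the 2010 RSE of 2.1% are reported in this paper, whereas the RSEs of 2.0% for 2008 and 2.5% for 2009 are placeholder values assumed for illustration.

```python
import math


def se_from_rse(estimate, rse_percent):
    """Convert a relative standard error (in %) to a standard error."""
    return estimate * rse_percent / 100.0


def change_is_significant(est1, rse1, est2, rse2, z_crit=1.96):
    """Two-tail test of the difference between two independent estimates.

    Returns the Z statistic and whether |Z| exceeds the critical value.
    """
    se1 = se_from_rse(est1, rse1)
    se2 = se_from_rse(est2, rse2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # assumes independence
    z = (est2 - est1) / se_diff
    return z, abs(z) > z_crit


# NEA 9, national level. RSEs for 2008 and 2009 are assumed placeholders.
z_08_09, sig_08_09 = change_is_significant(74.3, 2.0, 68.4, 2.5)
z_09_10, sig_09_10 = change_is_significant(68.4, 2.5, 69.8, 2.1)

print(f"2008 to 2009: Z = {z_08_09:.2f}, significant = {sig_08_09}")
print(f"2009 to 2010: Z = {z_09_10:.2f}, significant = {sig_09_10}")
```

With these illustrative inputs the 2008 to 2009 fall exceeds the critical value while the 2009 to 2010 movement does not, matching the pattern shown for NEA 9 at the national level in Table B. Because pooling lowers the RSEs, it shrinks the denominator of Z, which is the mechanism by which pooled data could in principle detect smaller changes.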


When single-year data were compared with two-year pooled data over the same two-year 
period 2008 to 2010, results were very similar. At the national level, there were no 
differences in the findings of significant change between single-year and pooled data over 
2008 to 2010 for any of the four indicators (Table C below). Change in NEA 7 (Year 
12/Certificate II, 20-24 years) was not significant for either single-year or pooled data while
change in the other indicators was significant in both. Similarly, at the state and territory 
level there were minor variations only between the significance of comparisons using single- 
year or pooled data. In addition to significant change in the indicators NEA 9 and NEA 10 at 
the national level, change was detected in only a small number of jurisdictions. Only in 
NASWD 2 was change detected both at the national level and in a majority of jurisdictions. 


Data pooling led to similar improvements in the reliability of estimates but marginal gains 
only in the detection of change when indicators were disaggregated by population 
characteristics, such as age, SEIFA quintile or highest year of schooling (See following 
sections). 


TABLE C. STATISTICAL SIGNIFICANCE OF CHANGE (a), single-year and pooled data,
2008 to 2010


             NEA 7               NEA 9               NEA 10              NASWD 2
             Single-   Pooled    Single-   Pooled    Single-   Pooled    Single-   Pooled
             year data data      year data data      year data data      year data data
NSW          no        yes       no        no        no        no        yes       yes
Vic.         no        no        yes       yes       no        yes       yes       yes
Qld          no        no        yes       yes       no        no        yes       yes
SA           no        no        yes       no        no        no        no        yes
WA           no        no        no        no        no        no        yes       yes
Tas.         no        no        no        no        no        no        no        no
NT           no        no        no        no        no        no        yes       yes
ACT          no        no        no        no        no        no        yes       no
Australia    no        no        yes       yes       yes       yes       yes       yes


a. Based on 2-tail test at 5% level of significance (Z=1.96)


FOOTNOTES 


[1] Originally ‘The proportion of young people participating in post-school education or
training six months after school.’


[2] Full engagement in employment, education or training comprises full-time employment,
full-time education/training or a mix of part-time employment combined with part-time
education/training. For NEA 10 education/training must be at Australian Qualification
Framework (AQF) Certificate III level or above (e.g. a trade qualification, vocational Diploma
or university degree).


NEA 7: YEAR 12/CERTIFICATE II (OR ABOVE) ATTAINMENT, 20-24 years


Like all three indicators in the National Education Agreement that are sourced from SEW,
NEA 7 (Year 12/Certificate II, 20-24 years) is based on a small sub-population of young
people determined by age. Furthermore, the indicator in most states and territories is slow
moving, following a generally steady upward trend. This reflects the relatively long period
over which Australian youth have had good access to, and high levels of participation in,
education and training, both at school and in tertiary settings. Single-year data are not
sufficiently precise to detect change at the national or state/territory level over a one or
two-year period. Nor does doubling the sample through data pooling lead to detection of change
over a two-year period. Single-year data, however, were sufficiently precise to detect growth at
the national level and for NSW over the three years 2007 to 2010.


NATIONAL AND STATE/TERRITORY DATA 


Based on standard single-year estimates, the proportion of the 20 to 24 year old population 
having attained at least a Year 12 or equivalent, or AQF Certificate Level II or above, rose 
from 83.5% in 2007 to 85.6% in 2010 (Datacube Table 1a). 


At the national level, the RSEs for NEA 7 were low and estimates reliable. The RSEs of the 
proportions in the single-year data ranged from 0.7% in 2007 to 1.0% in 2009 (Datacube 
Table 1a). The RSE was highest in 2009 because the SEW sample was reduced in 2009 
before being reinstated in 2010. For the 2010 reference point, the RSEs for the two-year 
pooled data at 0.6% and four-year pooled data at 0.5% were only marginally lower than their
equivalent in the one-year data of 0.8% (Table D below). Although the national level RSEs 
for NEA 7 were small, irrespective of data pooling, this is a slow moving indicator making it 
difficult to measure change. Using single-year data, no statistically significant change was 
detected at the national level between 2008 and 2009, 2009 and 2010 or the combined 
period 2008 to 2010. Pooled data did not detect change between 2008 and 2010. 


When the single-year data for NEA 7 was examined for the bigger states, New South Wales, 
Victoria, Queensland, South Australia and Western Australia, RSEs ranged between 1.2% 
and 2.9% in 2010. Among the smaller jurisdictions, the RSE was lower in the ACT (2.6%) 
where attainment was high (89.5%) than in Tasmania or the NT where levels of attainment 
were relatively low. Data pooling over two and, particularly, four years led to lower RSEs. In 
Tasmania, the RSE using four-year pooled data was 2.4% compared with 5.2% for single-year
data. For the NT, the gain was more modest, 4.4% for four-year compared with 4.8%
for single-year data (Table D below).


Despite improvements in precision, two-year data pooling did not lead to better detection of 
change in the indicator over time for the states and territories. The failure of both the single- 
year and pooled data at the national level to identify any significant change was broadly 
repeated at the state and territory level. The single-year data, over one year 2009 to 2010 
and over two years 2008 to 2010 found no significant change in the six states and two 
territories, while the two-year pooled data over two years (2008 to 2010) found only one: a
rise in attainment in NSW (Datacube Table 2c).


When change in NEA 7 was assessed using standard single-year estimates over the
three-year period, 2007 to 2010, there were statistically significant rises in Year
12/Certificate II or above attainment for 20-24 year olds at the national level and for NSW
only. No change was detectable in the other jurisdictions. Although there was a 3.6 percentage
point difference in the ACT, from 93.1% in 2007 to 89.5% in 2010, this apparent fall was not
statistically significant (based on Datacube Tables 1a, 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a).


TABLE D. YEAR 12/CERTIFICATE II, 20-24 years, single-year and pooled data, by
state/territory, 2010 (NEA 7)


             Single-year (2010)          2-year pooled (2009-10)     4-year pooled (2007-10)
             ‘000      %     RSE of %    ‘000      %     RSE of %    ‘000      %     RSE of %
NSW            429.3   86.0  1.4           429.9   86.1  1.0           421.8   84.5  0.8
Vic.           361.5   88.1  1.2           359.2   87.6  1.2           358.3   87.4  0.8
Qld            275.0   87.9  1.4           271.6   86.8  1.2           268.4   85.8  1.0
SA              91.2   80.2  2.6            90.6   79.7  1.5            90.7   79.8  1.3
WA             131.8   79.5  2.9           130.3   78.6  2.3           131.4   79.2  1.6
Tas.            24.2   77.1  5.2            23.2   73.9  3.7            23.1   73.6  2.4
NT              12.1   73.1  4.8            11.8   71.3  4.7            11.4   68.8  4.4
ACT             26.2   89.5  2.6            27.1   92.5  1.8            26.9   92.1  1.5
Australia    1,351.3   85.6  0.8         1,343.7   85.1  0.6         1,332.1   84.4  0.5


SOCIOECONOMIC STATUS 


At the national level, the RSEs for individual SEIFA quintiles based on single-year data 
ranged from below 1% for the higher quintiles to over 3% for the low quintiles reflecting the 
smaller proportion of people with Year 12 or equivalent attainment living in areas 
characterised by low socioeconomic status [1]. Higher RSEs in low quintiles were reflected
in greater fluctuation in the estimates and proportions. Again, the lower RSEs in the pooled
data did not lead to better detection of change. For individual quintiles, the only apparently
significant findings of change were for Quintile 3 and Quintile 5 for 2009 to 2010 using
single-year data (Datacube Table 1c). Given the large magnitude of these changes in contrast to
the small change at the national level overall and the fall in attainment recorded in Quintile 5 
(the least socioeconomically disadvantaged category), these may be random results due to 
sampling variability. 


While there were some statistically significant changes in NEA 7 by quintile at the 
state/territory level over the two-year period 2008 to 2010, these were few in number and, 
given the small sample base, may be due to sampling variability rather than actual change. 


See section on significance testing for further explanation of sampling variability and 
measures of change. 

FOOTNOTE 

[1] SEIFA Index of Relative Socioeconomic Disadvantage. For further information on this
and other SEIFA Indexes, see Information Paper: An Introduction to Socio-Economic
Indexes for Areas (SEIFA), 2006 (cat. no. 2039.0).


NEA 9: ENGAGEMENT, 15-19 years


This performance measure is associated with the smallest underlying population of all the 
indicators examined in this report as it looks at only those young people aged 15-19 years 
who have left school. The main indicator measures full engagement in employment or 
education, that is the total who are either in full-time employment, full-time education/training 
or a mix of part-time employment combined with part-time education/training. These sub- 
components are also of interest as are engagement by highest year of school completed 
and socioeconomic status. 


In contrast to the slow moving and steadily increasing attainment measure NEA 7 (Year 
12/Certificate II, 20-24 years), NEA 9 can be subject to relatively large fluctuations as a
result of the economic cycle. In the time horizon of this study, change in the level of full-time 
employment associated with the Global Financial Crisis (GFC) appeared to be more critical 
in determining whether or not change was detectable in this indicator than was data pooling. 
Indeed, potential limitations of data pooling over successive survey cycles to measure 
engagement are that it could obscure the time at which change actually occurred or even 
smooth out change. 


Nevertheless, if data pooling could double the sample over a shorter time frame of, say, one
year, this would lead to some improvement in the accuracy of point-in-time
estimates. Change on the scale of that observed during the GFC could then also be detected at
the national level and in the larger states.


NATIONAL AND STATE/TERRITORY DATA 


At the national level, based on standard single-year estimates, the proportion of young
people aged 15-19 years participating in post school employment, education or training (NEA 9)
was 73.7% in 2007 and 69.8% in 2010 (Datacube Table 1a).


The denominator population for NEA 9 is relatively small as it only includes people aged 
15-19 years who have left school. The consequence is that RSEs are higher than for the 
other measures. At the national level for the total fully engaged population, RSEs in 2010 
ranged from 2.1% for single-year data to 1.4% for two-year pooled data and 1.0% for four- 
year pooled data (Table E below). 


Based on standard single-year estimates, the proportion of young people aged 15-19 years 
who had left school and were fully engaged declined from 74.3% in 2008 to 68.4% in 2009, 
with a similar level of 69.8% observed in 2010 (Datacube Table 1a). The change over one
year, 2008 to 2009, tested as statistically significant but the change from 2009 to 2010 did not.
Change over the two-year period 2008 to 2010 was significant using both single-year and
two-year pooled data (Datacube Table 1c). 


Compared with NEA 7 (Year 12/Certificate II attainment, 20-24 years) and NASWD 2 (low 
qualifications, 20-64 years), which measure the accumulation of qualifications, NEA 9 
(engagement, 15-19 years) and NEA 10 (engagement, 18-24 years) are more volatile 
measures as they measure an individual’s current circumstances and hence reflect current 
economic conditions. In particular, NEA 9 (and NEA 10) appear to have been affected by the 
2009 GFC. 


The impact of the GFC is particularly clear when the fully engaged category is 
disaggregated into its two main constituent parts, young people in full-time employment and 
young people in full-time education/training. Based on single-year data, full-time 
employment declined from 36.2% in 2008 to 29.7% in 2009 and 28.7% in 2010. In contrast 
the proportion in full-time education was more stable, at 36.9% in 2008, 37.6% in 2009 and 
39.4% in 2010 (Datacube Table 1a). For full-time employment, the changes over the one 
year 2008 to 2009 and two years 2008 to 2010 test as statistically significant (using either 
single-year or pooled data) whereas corresponding changes in full-time education/training 
do not (Datacube Table 1c). 


At the state/territory level, data pooling resulted in lower RSEs, with the big gains occurring 
for four-year pooling where RSEs were reduced by about half (Table E below). 
Nevertheless, RSEs for Tasmania (5.5%) and the NT (6.2%) remained relatively high. 
Between 2008 and 2009, change in single-year data was detected in two states, Victoria
and Queensland, but in none between 2009 and 2010. Over the two-year period 2008 to
2010, statistically significant falls were observed in three states (Victoria, Queensland and
South Australia) based on single-year data but only two (Victoria and Queensland) based on
pooled data (Datacube Tables 3c, 4c, 5c).


TABLE E. DIFFERENCE IN ESTIMATES FOR 2010, change through data pooling (NEA 9)


             Single-year (2010)          2-year pooled (2009-10)     4-year pooled (2007-10)
             ‘000      %     RSE of %    ‘000      %     RSE of %    ‘000      %     RSE of %
NSW            166.4   73.3  3.7           159.8   70.9  2.9           155.3   69.1  2.0
Vic.           113.7   72.8  3.8           109.3   70.9  3.3           117.8   74.3  2.2
Qld            112.8   64.9  4.0           113.5   65.0  3.4           116.3   68.5  2.5
SA              31.2   61.8  6.0            33.2   64.8  4.0            33.5   66.3  2.8
WA              60.2   69.7  5.6            62.4   72.1  3.7            64.0   73.3  2.1
Tas.             8.9   63.0  8.9             9.4   63.9  8.2             9.6   62.5  5.5
NT               5.9   74.3  8.1             5.0   66.5  7.2             4.9   64.7  6.2
ACT              7.2   76.7  5.6             8.2   83.3  4.2             8.7   82.5  2.9
Australia      506.3   69.8  2.1           500.9   69.2  1.4           510.6   70.4  1.0


HIGHEST YEAR OF SCHOOL COMPLETED 


At the national level in 2010, for people aged 15-19 years who had completed Year 12 the 
RSE was 2.1% for full engagement overall, 3.2% for full-time education/training and 4.7% 
for full-time employment. For those who had completed schooling prior to Year 12, RSEs for 
full engagement were over 5% and full-time employment between about 6% and 9%. 
Reflecting the relatively small size of the group, RSEs for full-time education/training among 
15-19 year olds who had left school prior to completing Year 12 were substantially higher at 
about 15% to 19% (Datacube Table 1a). 


The pattern of statistically significant falls in full engagement and full-time employment 
between 2008 and 2009 and between 2008 and 2010 among all 15-19 year olds who had 
left school was also observed for the sub-groups whose highest year of school completed
was Year 12 or Year 10. Over the two-year period 2008 to 2010 these changes were
generally detected in both single-year and pooled data. In contrast, people whose highest 
year of schooling was Year 11 were not associated with significant change, but the number 
of people in this group was small (Datacube Tables 1a,1c). 


SOCIOECONOMIC STATUS 


At the national level, young people in the lowest quintile (most disadvantaged) had low 
levels of full engagement (57.0%) compared with those in the highest quintile (least 
disadvantaged) (76.9%)[1]. As a consequence RSEs were highest for the lower quintiles 
and lowest for the higher quintiles (Datacube Table 1a). Testing over 2008 to 2009 and 2008 
to 2010, using both single year and pooled data, did not lead to any clear conclusions about 
change in engagement among 15-19 year olds by socioeconomic status (Datacube Table 
1c). 


FOOTNOTE 


[1] SEIFA Index of Relative Socioeconomic Disadvantage. For further information on this 
and other SEIFA Indexes, see Information Paper: An Introduction to Socio-Economic 
Indexes for Areas (SEIFA), 2006 (cat. no. 2039.0)


NEA 10: ENGAGEMENT, 18-24 years


The discussion for this indicator is similar to that for the other engagement indicator, NEA 9 
(engagement, 15-19 years). In this case the sample is somewhat larger since it covers 7 
years rather than 5, and includes all people in the age group whereas NEA 9 includes only 
that component of the 15-19 year-old population that has left school. 


The potential benefits of boosting the sample size through data pooling to provide more 
accurate point in time estimates and the capacity to measure change at the national level 
would be greatly enhanced if pooling were to occur over one year rather than two. 


NATIONAL AND STATE/TERRITORY DATA 


Based on standard single-year estimates, the proportion of 18 to 24 year olds fully engaged 
in employment or education or training at or above Certificate III was 75.5% in 2007 and 
72.6% in 2010 (Datacube Table 1a). 


At the national level for the total fully-engaged population aged 18-24 years, the RSE in 
2010 ranged from 1.0% for single-year data to 0.9% for two-year pooled data and 0.7% for 
four-year pooled data (Table F below). As with NEA 9, the change over 2008 to 2009 but not 
2009 to 2010 tested as statistically significant, and the change over the two years 2008 to 
2010 tested as significant using both single year and pooled data (Datacube Table 1c). 


When the fully engaged population was partitioned into its constituent parts (slightly
differently from NEA 9), namely young people in full-time education/training only, full-time
employment only and a mix of education/training and employment (full-time education
combined with full-time employment or part-time education combined with part-time
employment), a fall in full-time employment was detected between 2008 and 2009, and
between 2008 and 2010, using both single-year and pooled data. Moreover, pooled but not
single-year data detected a significant rise in full-time education/training over 2008 to 2010.
This reflected both a larger difference in the indicator values and lower RSEs in the pooled
as compared with the single-year data (Datacube Tables 1a, 1b, 1c).


At the state/territory level, the relatively high single-year RSEs for 2010 of 5.2% for
Tasmania and 4.1% for the NT were reduced to 4.1% and 3.6% respectively under two-year
pooling and to about 3% each under four-year pooling (Table F below). Despite gains in 
accuracy, there was little detection of statistically significant change using pooled data. 
Between 2008 and 2009, Victoria was the only jurisdiction to record a significant fall in full 
engagement among 18-24 year olds. This was reflected in a corresponding fall in full 
engagement between 2008 and 2009 based on pooled but not single-year data. Some 
change was detected in NSW, where full-time employment fell between 2008 and 2009 and 
between 2008 and 2010. Conversely, participation in full-time education/training in NSW 
rose between 2008 and 2010, according to the pooled but not single-year data (Datacube 
Table 2c). 


TABLE F. DIFFERENCE IN ESTIMATES FOR 2010, change through data pooling (NEA 10)

                  Single-year (2010)   2-year pooled (2009-10)   4-year pooled (2007-10)
              '000      %   RSE of %    '000      %   RSE of %    '000      %   RSE of %
NSW          493.5   70.8        2.6   501.5   72.0        1.8   504.3   72.5        1.3
Vic.         424.2   75.6        1.6   412.3   73.8        1.5   420.0   74.8        1.0
Qld          316.9   72.1        2.7   322.6   73.2        1.9   325.1   73.9        1.4
SA           109.3   68.8        3.0   111.8   70.3        2.2   111.0   70.0        1.5
WA           170.8   74.6        2.6   172.5   75.2        1.9   174.1   75.9        1.2
Tas.          28.2   62.8        5.2    30.0   65.9        4.1    29.7   65.4        3.1
NT            16.2   71.3        4.1    16.1   70.5        3.6    15.9   70.8        3.0
ACT           33.0   84.8        2.3    32.3   82.8        2.9    33.1   84.3        2.2
Australia   1592.0   72.6        1.0  1599.1   73.0        0.9  1613.1   73.6        0.7
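
The limited detection of change at the state/territory level can be illustrated with a rough calculation. The sketch below converts an estimate and its RSE into a standard error and then derives the smallest difference between two estimates that would test as significant at the 5% level. It assumes that the two estimates being compared come from independent samples and have the same standard error; the function names are illustrative only and are not part of the ABS methodology. Pooled estimates for overlapping periods share sample, so the independence assumption is an approximation in that case.

```python
import math

def standard_error(estimate_pct, rse_pct):
    """Standard error in percentage points, from an estimate and its RSE (both in %)."""
    return estimate_pct * rse_pct / 100.0

def min_detectable_change(se_a, se_b, z=1.96):
    """Smallest absolute difference that tests as significant at the 5% level,
    assuming the two estimates come from independent samples."""
    return z * math.sqrt(se_a ** 2 + se_b ** 2)

# 2010 values for Tasmania from Table F: (estimate %, RSE of %)
tas = {
    "single-year": (62.8, 5.2),
    "2-year pooled": (65.9, 4.1),
    "4-year pooled": (65.4, 3.1),
}

thresholds = {}
for label, (pct, rse) in tas.items():
    se = standard_error(pct, rse)
    # Assumption: the comparison period has the same standard error.
    thresholds[label] = min_detectable_change(se, se)
    print(f"{label}: SE {se:.2f} pts, change needed {thresholds[label]:.1f} pts")
```

Under these assumptions, pooling lowers the size of change that could be detected for Tasmania, but the threshold remains several percentage points even with four-year pooling.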


SOCIOECONOMIC STATUS


At the national level between 2008 and 2009 and between 2008 and 2010, statistically 
significant falls in full engagement among 18-24 year olds were detected in two SEIFA 
quintiles: Quintile 3 and Quintile 5 (the least disadvantaged) [1]. These falls were 
consistently recorded in single-year and pooled data. In addition, single-year data detected 
a fall in full engagement between 2008 and 2010 among young people in the most 
disadvantaged areas, Quintile 1 (Datacube Table 1c). 


There were a small number of detectable changes by socioeconomic status in the larger 
states between 2008 and 2010. Data pooling over two years did not provide a clear benefit 
over single-year data. For instance, in Victoria, there was a detectable fall in full
engagement among 18-24 year olds living in the least disadvantaged areas, Quintile 5,
between 2008 and 2009 and a detectable rise between 2009 and 2010. This produced a
mixed result over the two-year period 2008 to 2010, whereby no change was recorded
based on single-year data but a significant fall was recorded based on pooled data (Datacube Table 3c). In
NSW (single-year data) and Queensland (pooled data) there were significant falls between 
2008 and 2010 in full engagement in Quintile 5 (Datacube Tables 2c, 4c). As noted in the 
section on NEA 9 (engagement, 15-19 years), the level of full engagement in employment 
and education/training generally follows a rising gradient from lowest to highest quintile. The 
detection of change in Quintile 5, therefore, is due not only to the magnitude of change but 
also to the smaller RSEs associated with the relatively large population base. The 
considerable falls in full engagement recorded in Tasmania in two quintiles between 2009 
and 2010 are based on a small sample and may be due to sampling variability (Datacube 
Table 7c). See the section on significance testing for further explanation of sampling 
variability and measures of change. 


FOOTNOTE 


[1] SEIFA Index of Relative Socioeconomic Disadvantage. For further information on this and
other SEIFA Indexes, see Information Paper: An Introduction to Socio-Economic Indexes for
Areas (SEIFA), 2006 (cat. no. 2039.0)


NASWD 2: LOW QUALIFICATIONS, 20-64 years


Compared with the other indicators, NASWD 2 (low qualifications, 20-64 years) is based on 
a very large sample. In the 2010 SEW there were around 35,000 people aged 20-64 years 
in the sample of whom about 16,000 had qualifications below Certificate III (Datacube Table 
10). Furthermore, NASWD 2 changed more rapidly than the other measures examined, 
particularly the other attainment indicator, NEA 7 (Year 12/Certificate II, 20-24 years).
NASWD 2 fell by about five percentage points between 2007 and 2010 whereas NEA 7 rose
by only two. The continuing decline in the proportion of people with qualifications below 
Certificate III reflects generational change, whereby older age cohorts with, on average, 
relatively low qualification levels are being replaced by younger ones with higher 
qualifications. It may also reflect, in part, access across all ages to continuing education and 
training opportunities. 


Even so, this indicator may benefit most from data pooling to double or even quadruple the 
sample. On the one hand, combining data over successive years led to mixed benefits, 
since change over two years could be detected with a similar degree of frequency using 
either pooled or single year data. On the other hand, data pooling over two years did lead to 
detection of change at the state/territory level for most jurisdictions and to some detection of 
change at the national level by 10-year age group or SEIFA quintile. 


The discussion of this indicator is similar to that for the other attainment indicator, NEA 7 
(Year 12/Certificate II, 20-24 years) except that the population base of 20-64 year olds is
much larger and the size and rate of change much greater. Indeed, single-year data for 
NASWD 2 in 2010 was based on a larger sample (denominator) than that for any of the 
other indicators even after four-year pooling (Datacube Table 10). 


NATIONAL AND STATE/TERRITORY DATA 


At the national level, the proportion of 20 to 64 year olds with qualifications below Certificate
III fell from 50.1% in 2007 to 45.4% in 2010 (Datacube Table 1a).


Unlike the indicators measuring participation in education/training and employment, such as
NEA 9 (engagement, 15-19 years) and NEA 10 (engagement, 18-24 years), NASWD 2 is not strongly
affected by the economic cycle. At the national level, RSEs in 2010 ranged from 0.6% for
single-year and two-year pooled data to 0.4% for four-year pooled data (Table G below). At the
national level, the accuracy of this indicator was adequate for measuring annual change
(falls) in the proportion of people aged 20-64 years with qualifications below Certificate III
between 2008 and 2009 and between 2009 and 2010. Accordingly, single-year and pooled
data also recorded statistically significant change over the two years 2008 to 2010
(Datacube Table 1c).


At the state/territory level, RSEs for single-year data in 2010 ranged from 1.4% to 2.1% in 
the five largest states: NSW, Victoria, Queensland, SA and WA, and from 3.0% to 3.8% in 
Tasmania, the NT and the ACT. For the smaller states and territories, RSEs were reduced to 
below 3% under two-year pooling and around 2% under four-year pooling (Table G below). 
Virtually no statistically significant changes in attainment were observed over the one-year
periods 2008 to 2009 or 2009 to 2010; the exception was a fall in Victoria between 2009
and 2010 (Datacube Table 3c). Data pooling over two years led to greater detection of
change over the two-year period 2008 to 2010 but these changes were also largely 
detectable using single-year data. Between 2008 and 2010, both single-year and pooled 
data recorded six out of eight jurisdictions with significant falls in the indicator (even though 
the set of jurisdictions differed slightly) (Datacube Tables 2c, 3c, 4c, 5c, 6c, 7c, 8c). 


TABLE G. DIFFERENCES IN ESTIMATES FOR 2010, change through data pooling (NASWD 2)

                  Single-year (2010)   2-year pooled (2009-10)   4-year pooled (2007-10)
              '000      %   RSE of %    '000      %   RSE of %    '000      %   RSE of %
NSW         1841.8   44.2        1.4  1865.9   44.8        1.1  1923.1   46.2        0.8
Vic.        1445.5   44.0        1.5  1502.6   45.7        1.2  1540.4   46.9        0.9
Qld         1244.1   47.5        1.6  1250.3   47.7        1.6  1289.8   49.2        1.0
SA           468.9   49.2        2.1   477.2   50.1        1.5   490.6   51.5        0.9
WA           634.8   46.5        1.9   646.6   47.3        1.4   666.7   48.8        1.1
Tas.         144.3   50.5        3.0   145.1   50.8        2.4   149.3   52.3        1.6
NT            64.1   47.0        3.5    65.9   48.2        2.6    69.1   50.6        2.1
ACT           74.6   33.9        3.8    77.7   35.3        2.8    80.3   36.5        2.1
Australia   5918.1   45.4        0.6  6031.3   46.3        0.6  6209.3   47.6        0.4


AGE 


By age, the proportion of 20-64 year olds without at least a Certificate III was highest for the
youngest age group, 20-24 years, many of whom were still studying and had yet to attain a
qualification. Excluding this age group, the proportion of the population without a Certificate
III or higher qualification generally increased with age, reflecting growth in qualifications by
successive age cohorts over time. At the national level RSEs for single-year data in 2010 
ranged between 1.0% and 1.9% across the 10-year age groups (Datacube Table 1a). 


Data pooling did not lead to an appreciable benefit in measuring change in this indicator by 
age group. At the national level, change by age group was sometimes detectable over one 
year and largely detectable over two years, using either single-year or pooled data 
(Datacube Table 1c). At the state/territory level, virtually no age group comparisons were 
detectable over the one-year periods 2008 to 2009 or 2009 to 2010, with the exception of 
Victoria where three age-groups recorded statistically significant falls in low attainment 
between 2009 and 2010 (Datacube Table 3c). A small number of state/territory comparisons 
were significant over the two-year period, 2008 to 2010, even in the NT and ACT, but 
without a consistent pattern between results for single-year and pooled data (Datacube 
Tables 2c, 3c, 4c, 5c, 6c, 7c, 8c). Some of these results may have been due to sampling 
variability rather than actual change. See the section on significance testing for further 
explanation of sampling variability and measures of change. 


SOCIOECONOMIC STATUS 


Since this indicator measures people with low qualifications, it is relatively high in the lower 
SEIFA quintiles and lower in higher quintiles [1]. In 2010, RSEs for single-year data at the 
national level ranged from 1.3% to 1.9% (Datacube Table 1a). Again, data pooling did not 
lead to appreciable benefits in measuring change in this indicator by socioeconomic status 
over the short-term. No change by SEIFA quintile was detectable between 2008 and 2009 or 
between 2009 and 2010. Over the two-year period 2008 to 2010 some changes were 
statistically significant based on either single-year or pooled data (Datacube Table 1c). 


When pooled data was examined over 2008 to 2010 at the state/territory level, change was 
detectable in one or two quintiles for most states/territories, although given the small sample 
size some of these results may have been due to sampling variability rather than actual 
change. In contrast, there were few detectable changes over the one or two-year periods 
using single-year data (Datacube Tables 2c, 3c, 4c, 5c, 6c, 7c, 8c). 


FOOTNOTE


[1] SEIFA Index of Relative Socioeconomic Disadvantage. For further information on this
and other SEIFA Indexes, see Information Paper: An Introduction to Socio-Economic
Indexes for Areas (SEIFA), 2006 (cat. no. 2039.0)


CONCLUSION


This paper illustrated that data pooling generally led to better or more reliable estimates of 
the variables of interest. Doubling the sample led to moderate reductions in RSEs for the 
selected COAG performance measures while quadrupling it reduced RSEs by about half. 
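
As a rough guide, if the yearly samples are independent, of equal size and drawn from a stable population, the RSE can be expected to fall with the square root of the pooling factor. The sketch below compares that rule of thumb with the national RSEs reported in Tables E and G. The function name `expected_rse` is illustrative and the equal-size assumption is a simplification.

```python
import math

def expected_rse(single_year_rse, pooling_factor):
    """RSE expected after multiplying the sample by pooling_factor,
    assuming independent samples of equal size and a stable indicator."""
    return single_year_rse / math.sqrt(pooling_factor)

# National RSEs (%) for 2010: (single-year, 2-year pooled, 4-year pooled)
reported = {
    "NEA 9": (2.1, 1.4, 1.0),     # Table E, Australia
    "NASWD 2": (0.6, 0.6, 0.4),   # Table G, Australia
}

for name, (single, two_year, four_year) in reported.items():
    print(
        f"{name}: 2-year expected {expected_rse(single, 2):.2f} vs reported {two_year}; "
        f"4-year expected {expected_rse(single, 4):.2f} vs reported {four_year}"
    )
```

Under these assumptions doubling the sample reduces the RSE by about 29% and quadrupling it by 50%. Reported reductions may be smaller than the rule of thumb suggests, for instance because yearly samples were not of equal size (the MPS sample was reduced in 2009).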


Data pooling on this scale, however, may not enable detection of very small changes in 
slow-moving indicators for which very large increases in sample size (particularly at 
jurisdiction or SEIFA-quintile level) are required. The data pooling options considered in this 
paper did not enable detection of the small changes in the selected education 
attainment/participation measures and much larger increases in sample size would be 
required to measure such small changes, if they exist. 


It should be noted that there is no reason why data pooling should always lead to detection
of small changes, since in some cases no real underlying change exists between two
estimates. It is possible that pooling may produce better and more reliable estimates that 
make the estimates for two periods move closer together, compared with single-year 
estimates. As such, no statistically significant difference between the two estimates would 
be detected, which would be the correct conclusion to draw in this case. 


Data pooling resulted in no appreciable gain in the detection of change over a two-year
period in the attainment indicator NEA 7, the proportion of the 20-24 year old population
having attained at least a Year 12 or equivalent or AQF Certificate Level II or above. This
indicator is very slow moving, reflecting the relatively long period over which
Australian youth have already had good access to and high levels of participation in education and
training, both at school and in tertiary settings. Furthermore, this indicator is based on a
small population which, even when doubled, does not provide a sufficient base from which
to detect change in the survey estimates over a two-year period.


The assessment of the potential benefit of data pooling for NEA 9 (the proportion of young
people aged 15-19 years participating in post school education or training) and NEA 10 (the
proportion of 18 to 24 year olds engaged in full time employment, education or training at or
above Certificate III) was affected by the occurrence of the Global Financial Crisis in 2009.
While data pooling detected change over the two-year period 2008 to 2010, so did
single-year data over the critical year 2008 to 2009.


Compared with single-year data over a one-year period there were some gains at the 
state/territory level in the measurement of change using data pooled over two years for 
NASWD 2 — the proportion of 20-64 year olds who do not have qualifications at or above a 
Certificate III level. This indicator is steadily declining and has a large population base. The 
continuing decline in this at risk indicator reflects generational change, whereby older age 
cohorts with, on average, relatively low qualification levels are being replaced by younger 
ones with higher qualifications. It may also reflect, in part, access across all ages to 
continuing education and training opportunities. While NASWD 2 has a sufficiently large 
sample and rate of change for data pooling to lead to improvements in accuracy at the 
state/territory level, these gains are offset to the extent that single-year data were also 
sufficient for measuring the change that occurred over a two-year period. 


Nevertheless, the data pooling methodology investigated here may have application in other 
situations, for instance where the annual rate of change of a variable is such that it is just 
below the threshold of statistical significance when based on comparison of single-year 
estimates. Data pooling allows for the sample from one survey to be boosted with data from 
other surveys without the additional cost of collection and with no further provider load on 
the community. The methodology adopted here, based on re-weighting the combined 
sample to the latest survey reference point, resulted in estimates that were broadly 
consistent between single-year and pooled data. Although the pooled data were lagging due 
to changes in population structure and labour force participation over the period in which 
pooling took place, estimates for narrow age ranges of about 5 to 10 years were remarkably 
similar. 


Some of the limitations of this study might be overcome if it were possible to pool data over a
single calendar year. This would further minimise the lag effect of pooling and allow some
short-term trends, for example in the engagement of youth in work or study, to be better
detected.


MORE INFORMATION 


For more information, contact Information Consultancy on 1300 135 070 or email 
education.statistics@abs.gov.au 


METHODOLOGY


THE POTENTIAL BENEFITS OF DATA POOLING


The ABS Survey of Education and Work (SEW) is conducted in May of each year as a 
supplementary survey of the Monthly Population Survey (MPS) of which the main 
component is the Labour Force Survey (LFS). SEW provides a range of key educational 
participation and attainment measures of people aged 15-74 years, along with information 
on the transition between education and work. Education and work data have been available 
annually since 1979. 


SEW data have been used for many years by the Ministerial Council for Education, Early 
Childhood Development and Youth Affairs (MCEECDYA) and more recently the Council of 
Australian Governments (COAG) as the data source for key performance measures of youth 
participation and attainment in education and training. SEW provides accurate point in time 
estimates at the national level, and reasonably accurate estimates for states and territories. 
People in Aboriginal and Torres Strait Islander communities in very remote areas, however,
are not included in the survey. This has only a minimal impact on estimates for most states
and territories, with the exception of the Northern Territory where people in these
communities comprise about 15% of the territory population.


The estimates at finer levels of disaggregation are subject to a higher degree of sampling 
error. Furthermore, apart from point-in-time or level estimates, clients are increasingly
interested in being able to detect small movements in key performance measures from year 
to year at the national and jurisdictional level and for sub-populations. This is difficult given 
the relatively slow rate of change in some indicators, especially those associated with 
attainment, the narrow age bands of interest and the size of the current sample. Increasing 
the sample size would be extremely expensive and arguably is impractical as the SEW is 
collected as a supplement of the MPS which has an established sampling methodology. A 
potential solution is to pool or combine SEW data over time and/or pool it with other existing 
surveys that collect education data to increase effective sample size and thereby reduce the 
sampling error of the educational performance measures. 


SEW provides data for four key national reporting measures of participation and attainment. 
These are: 


- NEA (National Education Agreement) Indicator 7. The proportion of the 20 to 24 year
  old population having attained at least a Year 12 or equivalent or AQF Certificate Level
  II or above

- NEA Indicator 9. Proportion of young people (15-19 years) participating in post school
  employment, education or training

- NEA Indicator 10. The proportion of 18 to 24 year olds engaged in full time
  employment, education or training at or above Certificate III

- NASWD (National Agreement on Skills and Workforce Development) Indicator 2.
  Proportion of 20-64 year olds who do not have qualifications at or above a Certificate
  III.


DIFFERENT OPTIONS FOR DATA POOLING 


The potential advantage of pooling is to increase the sample size and reduce the sampling 
error of the measures or variables of interest without having to undertake additional and 
costly sampling. Pooling generally assumes that each survey is an independent non- 
overlapping sample of the same population and that the characteristics of the population 
and the variables of interest do not change substantially from one survey to another. The 
ABS Analytical Services Section conducted an initial assessment on five different options for 
survey data to be used in data pooling: 


- Option A: Combining current SEW with one or more previous SEWs
- Option B: Combining SEW with another MPS Supplementary Survey
- Option C: Combining SEW with a Special Social Survey
- Option D: Combining SEW with the Multi-Purpose Household Survey (MPHS)
- Option E: Combining SEW with another SEW, MPS Supplementary Survey and
  MPHS.


Option A - Combining current SEW with one or more previous SEWs 
Advantages 


The advantage of Option A is its simplicity. It combines successive surveys that have a 
similar survey design, scope, coverage, sampling unit, reporting method, mode of survey, 
reference period and weighting method. All the performance measures of education are 
included in these surveys, and since 2001, similar wording has been used for the education 
questions. Given that the education variables are conceptually the same and similarly 
measured there is little need to harmonise or adjust the measures before pooling. Deriving 
combined or pooled estimates is simpler as composite or adjusted survey weights for pooled 
data can more easily be computed. 


Disadvantages 

By pooling successive cycles of the same survey we lose the independence of the annual
time series for the data, unlike the situation where several concurrent surveys are pooled
together. It becomes more difficult to interpret the estimate obtained by combining or pooling
several consecutive SEWs, as estimates obtained in this way are no longer point estimates
but smoothed measures of the combined period.
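
The smoothing effect can be illustrated with the national NASWD 2 figures. The sketch below is a simplification: it assumes the indicator moved in a straight line between the reported 2007 and 2010 values (the intermediate years are interpolated, not reported figures) and treats the pooled estimate as an equal-weight average of the four years, ignoring re-weighting to the latest benchmarks.

```python
def linear_path(start, end, periods):
    """Equally spaced values from start to end (inclusive) over `periods` points."""
    step = (end - start) / (periods - 1)
    return [start + step * i for i in range(periods)]

# National NASWD 2 values (%) reported for 2007 and 2010;
# the two intermediate years are interpolated for illustration only.
yearly = linear_path(50.1, 45.4, 4)

# Equal-weight average as a crude stand-in for a four-year pooled estimate
pooled_approx = sum(yearly) / len(yearly)
lag = pooled_approx - yearly[-1]

print(f"approx. pooled: {pooled_approx:.2f}%, latest year: {yearly[-1]:.1f}%, lag: {lag:.2f} pts")
```

For comparison, the reported four-year pooled estimate for 2010 is 47.6%, against 45.4% from single-year data (Table G), which is consistent with a pooled figure that sits between the start and end of the pooling period.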

Option B - Combining SEW with another MPS Supplementary Survey 


Advantages 


Option B allows independent pooled estimates to be provided over a shorter period, for
example, one year. Given that the surveys involved are supplements to the MPS, they are all
based on similar methodology and survey design.


Disadvantages 


The increase in sample size is modest because, given the nature of the MPS, there is
considerable overlap in the sample. The MPS sample comprises eight staggered groups, each
of which spends eight successive months in the survey. Each month, one of the groups moves out of
the sample and a new group moves in. Each monthly sample, therefore, retains seven out of
eight of the respondents interviewed in the previous month. Furthermore, in Option B there
are differences in opportunities for engagement in the reference period that must be taken
into account with surveys conducted in different months of the year. The SEW, for example, is
conducted in May, when most students will be enrolled or in work, while a survey conducted
early in the year may find students between study and work. Moreover, the gains in sample
size may differ by age group due to differences in the scope of the surveys, with some of the
surveys having different sampling distributions. For example, the Job Search Experience
Survey (JSE) only interviews persons who are unemployed, or who are employed and started their
current job in the previous 12 months. All these issues mean that this approach requires
complex re-weighting.
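
The rotation pattern described above implies a simple rule for how much of the sample two MPS-based surveys share. The sketch below assumes eight equal-sized rotation groups and no attrition or non-response; the function name `mps_overlap` is illustrative.

```python
def mps_overlap(months_apart, groups=8):
    """Share of one month's MPS sample still in the survey `months_apart` months later,
    assuming equal-sized rotation groups and no attrition."""
    if months_apart < 0:
        raise ValueError("months_apart must be non-negative")
    return max(groups - months_apart, 0) / groups

for gap in (1, 5, 8, 12):
    print(f"{gap} month(s) apart: {mps_overlap(gap):.3f} of the sample in common")
```

On these assumptions, consecutive months share seven-eighths of the sample, surveys five months apart share three-eighths, and surveys eight or more months apart (including successive May SEWs) share none.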


Option C - Combining SEW with a Special Social Survey (SSS) 
Advantages 


Option C, combining SEW with an SSS, such as the National Health Survey or the Survey of 
Disability, Ageing and Carers, would increase the effective sample and reduce RSEs. 


Disadvantages 


The reductions in RSEs are small while variation in scope, reporting method, response 
rates, frequency and timing of these surveys require substantial adjustment or 
harmonisation of data prior to pooling. There is also substantial risk of introducing additional 
non-sampling error in the combined estimates under this option: e.g. differences in 
scope/coverage between surveys and the way response variables are defined/measured 
etc. Furthermore, with these surveys conducted every 3-6 years, pooled estimates would 
only be provided several years apart. 


Option D - Combining SEW with the Multi-Purpose Household Survey (MPHS) 
Advantages 

Option D involves combining SEW with the MPHS, itself a supplement of the monthly MPS. 
As the MPHS and SEW are both conducted as supplements of the MPS, they have the 
same survey design, sampling unit, coverage, mode and weighting method. This option 
allows the annual time series to be maintained for pooled data. 

Disadvantages 

The additional sample made possible by the inclusion of the MPHS is small as it is only 
conducted on one-third of the out-going MPS rotation group each month and there is 
overlap between the MPHS sample enumerated from June to December and the May SEW. 


Option E - Combining SEW with another SEW, MPS Supplementary Survey and MPHS 


Advantages 


Estimates from the May SEW would be pooled with an additional SEW conducted in 
October, and with all MPS Supplementary Surveys and the MPHS conducted in the current 
year. All these surveys are supplements of the monthly MPS and hence consistent in terms 
of survey design, sampling unit, reporting method, mode of survey, questionnaire wording 
and weighting method. Combining the surveys doubles the sample size and has a similar 
impact in reducing RSEs to Option A. Option E’s advantage over Option A is that it 
maintains the annual time series in the education data. 


Disadvantages 


Option E does no more than double the sample because the October SEW would be carried 
out only five months subsequent to the May SEW with a consequent overlap of the sample 
reducing the effective gain in sample size. Option E's main disadvantage is in terms of 
increased complexity of combining data from different sources with varying scope and 
enumeration periods. An additional SEW would involve substantial costs, which would have 
to be user-funded. 


ABS RECOMMENDED APPROACH FOR SEW DATA POOLING 


After assessing these options, the ABS concluded that only Options A and E provided a 
sufficient increase to the effective sample and consequent reduction in RSEs to be worth 
pursuing for data pooling. Option A, combining the most recent SEW with one or more 
previous SEWs, was preferred for its overall advantages in increasing the effective sample 
size and relative simplicity. These were seen as outweighing the disadvantage that annual 
estimates would not be produced. Moreover there was only limited support from the 
stakeholders for a user-funded SEW. 


ALTERNATIVE METHODS FOR COMBINING/POOLING CONSECUTIVE SURVEYS 


Estimates for a variable of interest can be prepared either by using an aggregate 
‘composite’ or a unit level ‘pooled’ approach. Composite estimates are produced as a 
weighted average of the individual year aggregate estimates of interest, while pooled 
estimates are derived by merging unit level data from the relevant surveys into a single 
dataset, adjusting the individual unit weights in the merged dataset and producing estimates 
using these new weights. The composite approach is used where there are significant 
differences in population characteristics or the variable of interest. The pooled approach is 
generally used where the population does not change significantly from one year to the next, 
where the survey samples are independent of each other and where they measure more or 
less the same variables. 
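
The unit level pooled approach can be sketched in a few lines. The example below merges unit records from two surveys and rescales their weights so that weighted counts in each post-stratum match benchmark totals for the latest period. It uses simple post-stratification as a stand-in for the full weighting procedure, and all records, strata, benchmark totals and function names are hypothetical.

```python
from collections import defaultdict

def pool_and_reweight(surveys, benchmarks):
    """Merge unit records from several surveys and rescale weights so that
    weighted counts in each post-stratum match end-period benchmark totals.

    surveys: list of lists of dicts with keys 'stratum', 'weight', 'y'
    benchmarks: dict mapping stratum -> population total at the end period
    """
    pooled = [dict(unit) for survey in surveys for unit in survey]  # copy records
    weight_sums = defaultdict(float)
    for unit in pooled:
        weight_sums[unit["stratum"]] += unit["weight"]
    for unit in pooled:
        stratum = unit["stratum"]
        unit["weight"] *= benchmarks[stratum] / weight_sums[stratum]
    return pooled

def weighted_proportion(units):
    """Weighted share of units with y == 1."""
    total = sum(u["weight"] for u in units)
    return sum(u["weight"] * u["y"] for u in units) / total

# Hypothetical unit records: y = 1 if the person has the characteristic of interest
survey_2009 = [
    {"stratum": "20-24", "weight": 100.0, "y": 1},
    {"stratum": "20-24", "weight": 100.0, "y": 0},
    {"stratum": "25-64", "weight": 400.0, "y": 1},
]
survey_2010 = [
    {"stratum": "20-24", "weight": 110.0, "y": 1},
    {"stratum": "25-64", "weight": 420.0, "y": 0},
]
benchmarks_2010 = {"20-24": 220.0, "25-64": 840.0}  # hypothetical end-period totals

pooled = pool_and_reweight([survey_2009, survey_2010], benchmarks_2010)
print(round(weighted_proportion(pooled), 3))
```

Because every record is rescaled to the latest benchmarks, the pooled estimate refers to the most recent population structure even though part of the sample was collected earlier.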


Composite methods are simpler requiring, in theory, less computation and time to produce 
estimates, while methods based on the pooled approach are slightly more computationally 
intensive, requiring larger amounts of information and time to produce the estimates. The 
methods were compared in terms of information and time required and most importantly in 
terms of accuracy. For reasons of efficiency and theory the primary methods considered 
were: 


- The aggregate composite minimum variance estimator takes a weighted average of
the estimators from the surveys to be combined with a weighting factor based on 
minimising the combined variance. This method ensures that the survey with the lower 
variance is given a higher weight in the combined estimate relative to other surveys to 
be combined. 


- Pooled weights calibrated to end period population/labour benchmarks using the
Generalised Regression Weighting (GREG) method. The GREG method is the 
standard survey weighting procedure used by the ABS for population surveys. This 
method applied to SEW involves calibrating the initial weights in the pooled dataset to 
post-stratified external population and labour force benchmark totals of the end survey 
period using a GREG SAS macro to derive the final weights (and associated replicate 
weights). These weights are then used to produce the estimates of interest and their 
associated RSEs. Calibrating the weights of each survey in the combined dataset to 
demographic/labour force benchmark totals of the most recent survey makes this 
survey the reference period. The pooled (and composite) estimates produced with 
respect to the time point of the most recent survey data used can be described as 
lagging estimates, since sample from earlier surveys combined with sample from the 
most recent time point is employed as the total sample. The degree of lag in the 
estimates depends on the long term trend behaviour of the population characteristic 
being estimated. This is the most comprehensive method of weighting among the 
methods considered but it is computationally more complex and, in theory but not 
necessarily in practice, requires more time and resources to be implemented. (See 
also section on the GREG Method below). 
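
As a rough illustration of the first method, the sketch below combines survey estimates using
inverse-variance weights, which is the textbook minimum-variance weighting for independent,
unbiased estimates. The function name and the figures are hypothetical, and the exact
weighting factor used in the ABS testing may have differed.

```python
import math


def composite_min_variance(estimates, standard_errors):
    """Combine independent survey estimates with inverse-variance weights.

    Assumption: the surveys are independent and each estimate is unbiased,
    so the variance-minimising weight for each survey is proportional to
    1 / variance.
    """
    inverse_variances = [1.0 / (se ** 2) for se in standard_errors]
    total = sum(inverse_variances)
    weights = [iv / total for iv in inverse_variances]
    combined = sum(w * e for w, e in zip(weights, estimates))
    combined_se = math.sqrt(1.0 / total)
    return combined, combined_se, weights


# Hypothetical proportions (per cent) from two survey years with their SEs.
combined, combined_se, weights = composite_min_variance([84.0, 85.0], [1.0, 2.0])
```

With these made-up inputs the survey with the smaller standard error receives the larger
weight, and the combined standard error is below that of either survey alone. Because the
weights depend on the variances of the particular measure, they have to be recomputed for
every variable and every level of output.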


Testing the alternative methods 


A key assumption in data pooling is that the surveys to be pooled are from the same 
population and that the characteristics of the population and the variables of interest have 
not changed substantially from one survey to another. This assumption was tested by 
examining distributions across the surveys for selected variables, testing for the significance 
of difference in key education measures across the surveys and tests for survey effect for 
the variables being measured. Over the period 2006 to 2009, education-related outcomes 
for the major socioeconomic and demographic groups were broadly stable. The major 
concern was that an MPS sample size decline in 2009 led to a slightly larger error for pooled 
estimates than would have been the case had no reduction occurred. This should be a passing 
problem as the full MPS sample was reinstated for the 2010 SEW. 


A test for survey effect was also conducted to see if the estimates in the pooled data set 
were affected by the period or the survey in which they were collected. The model results 
showed no consistent pattern and the survey effects that were found, although significant, 
were small. Overall the surveys were found comparable and pooling feasible. 


In preliminary testing with three trial measures both methods produced moderately better 
estimates (in terms of reduced RSEs) than single year estimates. Compared with the pooled 
method, the simpler composite method generally produced lower RSEs for two measures but 
higher RSEs for the third. Nevertheless, the RSEs for the two methods were strongly 
correlated and the differences between them were on average small. 


In choosing the optimal method it is important to choose one based not just on the most 
efficient results but on wider theoretical and practical considerations. Ideally the method 
chosen should produce reliable and improved estimates without requiring too much 
information, computation, time and effort. Each method has its own advantages and 
disadvantages. 


The composite method appears simple to use and in testing of three trial measures 
produced on average marginally lower RSEs than the unit level GREG estimator method. 
The composite method just requires data on the estimates and their associated RSEs. A 
problem with this method is that the weighting factor has to be calculated separately for 
each variable included in the analysis and it can be time consuming to implement if many 
time periods are involved and if more than two surveys are being combined. 


Furthermore the estimates of the weighting factor under this particular method vary over 
time, between measures and between the levels at which the measures are produced. This 
means that estimates are not being produced on a consistent basis and as such 
comparisons between the values across national, jurisdictional and Socio-Economic Indexes 
for Areas (SEIFA) quintile level may not be valid or appropriate. This is of particular concern 
given this study's objective of providing more accurate measures, especially of trends over 
time, of key indicators of participation and attainment. 


The GREG method is a more efficient method of combining population samples across 
surveys. It automatically handles the multiple benchmark classes used by the ABS and 
pooled data can be weighted consistently with the post-stratified survey weighting procedure 
used by the ABS for all the independent survey samples. Unlike the aggregate composite 
estimator method, the GREG estimator method requires only one set of weights for the 
different pooling periods, so comparisons across measures are seamlessly coherent. Since 
the weights under this method are calibrated to the benchmark totals of the end-period 
survey, the reference point for the estimates is the most recent survey. This is in contrast to 
the composite method where the reference period is difficult to determine or interpret. The 
way the sample weights are benchmarked under the GREG method also helps to capture 
the variation in data better, particularly at the jurisdictional level and below, since the 
calibration of the weight is done at a finer level through the use of population benchmarks at 
jurisdictional and sub-jurisdictional level. 


The main drawback of the GREG method is that more information, data and computing are 
required to derive the weights for the pooled dataset with implications for time, effort and 
resources. At present initial weights have to be derived outside of the final SEW data file 
and two sets of benchmark data (demographic and labour force) are required before this 
information can be fed through the GREG SAS macro to derive the pooled weights. 


However, in reality the task of obtaining the benchmarked data is relatively simple. Since the 
final survey weights in each SEW file are already calibrated to the relevant 
demographic/labour force data for that year the required benchmark data for the pooled 
dataset can simply be obtained from the relevant end-period SEW itself. If the initial weights 
and their corresponding replicate weights are included along with the final weights and their 
corresponding replicate weights, which are already in the final SEW data files, no outside 
data will be needed for analysis. 


While the GREG method may appear more computationally intensive, requiring more data 
and information, the data required are already included in the SEW datasets, so the 
procedure for weighting and producing the estimates under this method becomes routine. 
Given its overall analytical and practical advantages, the GREG estimator method has been 
adopted as the preferred approach to data pooling. 


The lagged GREG estimator method 


The lagged GREG estimator method obtains the final survey weights through the calibration 
of the initial survey weights (which are based on the initial probability of selection) to a set of 
external population/labour force benchmarks using the ABS GREG SAS macro. With SEW 
data this involves merging the surveys to be pooled into one dataset and then calibrating 
the initial weights in the combined data set to the set of external population/labour force 
benchmarks for the end reference date. For example if SEW 2009 and SEW 2010 are to be 
pooled then the two datasets are merged together into one dataset and their initial weights 
then calibrated to the population/labour force benchmark totals of SEW 2010. 

For the SEW the following two benchmarks are used to produce the weights using the 
GREG SAS macro: 


- Demographic benchmark (State by Part of state by Sex by Age group). The 
categories/groups for this benchmark are: 8 (State) x 2 (Capital cities/Rest of state) 
x 2 (Sex) x 10 (Age group 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 
50-54, 55-64, 65+). 

- Labour Force benchmark (Labour force status by Age group). The categories for this 
benchmark are: 3 (Labour force status — employed, unemployed, not in the labour 
force) x 13 (Age group — 15, 16, 17, 18, 19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-54, 
55-64, 65+). 


It should be noted that the age groups differ between the two benchmarks: the labour force 
benchmark uses finer groups at younger ages and wider groups at older ages than the 
demographic benchmark. 

In brief, calibrating the weights in the pooled SEW dataset to 
demographic/labour force benchmarks involves the following steps: 


1. Obtain initial weights, corresponding replicate weights and dwelling cluster identifier 
from the May Labour Force Survey (of which the SEW forms a module) for the 
relevant SEW surveys 

2. Merge the records obtained from the May LFS with the records in the SEW merged file 

3. Obtain Demographic benchmark data (Benchmark 1) for the end of the required 
pooled periods (i.e. 2010 for 2009-10) 

4. Obtain Labour force benchmark data (Benchmark 2) for each of the required pooled 
periods (i.e. 2010 for 2009-10) 

5. Adjust the initial weights and the associated replicate weights in the merged SEW data 
file. This requires multiplying the initial weights by (8/7)/(number of surveys): the 
factor 8/7 rates up the population total to account for the fact that SEW is only 
conducted on 7/8ths of the LFS sample, and the result is divided by the number of surveys 
being pooled together. 

6. Use GREG SAS Macro to calibrate the adjusted initial weights (and the associated 
replicate weights) to the required benchmark totals to derive the final pooled weights 
for each of the required periods. 

7. Use ABS %Table SAS Macro to derive the variances, SEs and RSEs for the measures 
from the pooled dataset for each of the relevant periods. 
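
Steps 5 and 6 can be sketched in a few lines. The code below is a minimal illustration, not
the ABS GREG SAS macro: it assumes plain linear calibration, in which each final weight is
the adjusted initial weight multiplied by (1 + x . lambda), and it omits the bounds on weight
changes that production systems often apply. All function names, the six-person file and
the benchmark totals are hypothetical. Replicate weights would be passed through the same
two functions.

```python
def adjust_initial_weights(initial_weights, n_surveys):
    """Step 5: rate up by 8/7 and divide by the number of pooled surveys."""
    return [w * (8.0 / 7.0) / n_surveys for w in initial_weights]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def linear_calibrate(d, x_rows, totals):
    """Step 6 (sketch): linear calibration so weighted totals hit the benchmarks."""
    k = len(totals)
    # a = sum_i d_i * x_i x_i^T; gap = benchmark totals minus current weighted totals
    a = [[sum(di * xi[p] * xi[q] for di, xi in zip(d, x_rows)) for q in range(k)]
         for p in range(k)]
    gap = [totals[p] - sum(di * xi[p] for di, xi in zip(d, x_rows)) for p in range(k)]
    lam = solve_linear(a, gap)
    return [di * (1.0 + sum(l * v for l, v in zip(lam, xi)))
            for di, xi in zip(d, x_rows)]


# Hypothetical pooled file: six people drawn from two surveys.
# Columns of x: aged 15-24, aged 25 and over, employed.
x_rows = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]]
initial = [700.0] * 6
totals = [1300.0, 1100.0, 1250.0]  # hypothetical end-period benchmark totals

adjusted = adjust_initial_weights(initial, n_surveys=2)
final = linear_calibrate(adjusted, x_rows, totals)
weighted_totals = [sum(w * xi[p] for w, xi in zip(final, x_rows)) for p in range(3)]
```

After calibration the weighted totals in `weighted_totals` match the benchmark totals, which
is the property that makes the end-period survey the reference point for the pooled
estimates.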


WEIGHTING 


The weights in the pooled data are benchmarked to produce estimates of selected 
population characteristics of the last surveyed year included in each dataset because this is 
the year for which estimates are being made. The method is equivalent to the current SEW 
weighting and estimation method for a single year. However, since the pooled data are two 
combined years of SEW sample, the estimates produced are ‘lagging’ with respect 
to estimates that may in principle have been produced from a comparable independent 
survey, of the same sample size of the pooled dataset, conducted in the reference year of 
the benchmarks. An advantage of the individual year estimate is that it applies to the 
particular year concerned; whereas a pooled sample reflects the combined behaviour of the 
population (subject to socioeconomic, migration factors etc) of the years included and hence 
may be best understood as a lagging estimator for the reference year representing the 
situation somewhere during the time period between the two surveys depending on the 
weighting given to each. 


The denominator for the key measures of participation and attainment, which are estimates 
of proportion, is based on the total number of people in the age group of interest for the 
year to which the survey is benchmarked, that is, the last year included in the estimates. 
Given that one of the benchmarks is 5-year age groups, the total population of 5-year age 
groups in the 2009-2010 pooled data is identical to those in the 2010 single-year estimates 
— see Table 10. However, the numerator of the estimates of proportion will reflect all the 
years included in the combined data sets. This means the estimate of the population with 
that characteristic will reflect the data of both years and the adjusted weights required to 
minimise RSEs but only the denominator population of 2010. 
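
In symbols, and using notation introduced here only for illustration, the two-year pooled
proportion for 2009-2010 can be written as follows, where $s_{2009}$ and $s_{2010}$ are the
two samples, $w_i$ is the final pooled weight of person $i$, $a_i = 1$ if person $i$ is in
the age group of interest, $y_i = 1$ if that person is in the age group and has the
characteristic, and $N_{2010}$ is the 2010 benchmark population of the age group:

$$
\hat{p}_{2009\text{-}10}
  = \frac{\sum_{i \in s_{2009} \cup s_{2010}} w_i \, y_i}
         {\sum_{i \in s_{2009} \cup s_{2010}} w_i \, a_i},
\qquad
\sum_{i \in s_{2009} \cup s_{2010}} w_i \, a_i = N_{2010}
$$

The second equality holds when the age group is made up of whole benchmark classes. The
numerator sums over respondents from both years, while the denominator is fixed by the 2010
benchmark.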


The ABS benchmarking method for population surveys, as used here, is a generalised 
regression technique which is a model-assisted method of estimation. The choice of 
benchmark variables for weighting and the model assumption about the independence of 
educational attainment variables on the labour force, age and gender has a small impact on 
the estimates produced. The SEW weights for an individual year are calibrated to align with 
independent estimates of the population by sex, 5-year age group, state or territory of usual 
residence, section of state or territory and labour force status. For pooled data across 
several years, extra model assumptions that the characteristics of interest and benchmark 
variables (eg. labour force proportions) do not depend on the year of collection mean that 
the pooled data is not a simple average of the single year estimates. On occasion this can 
lead to the final pooled result being outside the proportion estimates derived from the single 
year data that make up the pooled data set. For example, the estimate derived from the 
two-year pooled result including 2009 data could be fractionally above or below the estimates 
from the data of the two single years that go into it because of the different overall SEW 
sample size in 2009 compared to adjacent years. 


The data pooling method could also be varied by using a different set of initial weights. For 
example, using the final weights for the individual SEW years as initial weights for the 
pooled data may reduce the non-response differences in sample take between the two 
years. However, it would reduce the benefits of lower RSEs for fine output categories under 
the model assumption that educational attainment changes slowly over two years. It is also 
possible that education questions will be added in the future to other LFS supplements 
where the initial LFS weights are readily available and hence including such initial weights in 
this report may give a better indication of the data performance that may occur for future 
pooled estimates. 


The extra implicit assumptions with pooling data means that more care should be taken with 
computed levels of significance. The estimate of significance used here is based on the 
generally accepted p-value of 0.05, which is an estimate related solely to sampling error. 
There are many other factors in a pooling project, particularly in the weighting, that could 
cause a marginal increase or decrease in the value of an estimator. Most of the factors are 
model assumptions related to non sampling error, e.g. assumptions on change of 
characteristics and non-response across two years, and hence will not directly affect the 
variance estimate. Hence care should be taken when the calculated p-value is near 0.05. 


In summary, the presented pooling results arise from one choice of initial weights and 
population benchmarks. As such, that choice and the validity of model assumptions about 
population behaviour and survey response across multiple years will have some effect on 
the estimates produced. The method, however, is identical to the method for SEW weighting 
of individual years. In analysing the results for 2007-2010, some very small impacts on the 
comparability of pooled data and individual year time series data have been observed. The 
general behaviour of the two time series however was consistent with the pooled data 
exhibiting lower RSEs for the time point estimates. 


APPROACH TO PRODUCING POOLED SERIES 


In pooling data, two alternative procedures need to be considered for pooled series: 
non-overlapping pooled estimates; and rolling pooled estimates. Non-overlapping estimates 
compare pooled data from an initial set of surveys with data from a subsequent set of 
surveys. Rolling estimates combine adjacent data that overlaps in the middle. 
Non-overlapping estimates are computationally simpler and the trends produced are easier to 
interpret. However, estimates cannot be produced as frequently: for example, trends 
using two-year pooled data would require four years of data for two data points, and 
four-year pooled data would require eight years of data. 


The advantage of rolling estimates is that estimates can be produced each year maintaining 
the time series in the data. The disadvantage is that it is difficult to interpret what changes in 
the data are measuring. An estimate of the change between two sets of rolling pooled data 
will include some cases that are in both sets of data. In effect this will reduce the difference 
but more importantly the standard error of the change between the two consecutive rolling 
estimates will require evaluation of the resulting covariance between the two estimates. 
Consequently, while rolling estimates may be appropriate for producing level estimates on 
an annual basis, they are not appropriate for measuring change. For these 
reasons non-overlapping data have been preferred. 
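
The covariance issue can be made concrete with a simplified worked example. Assume, purely
for illustration, that a two-year pooled estimate $P_t$ is a simple average of independent
single-year estimates $e_t$ with variances $\sigma_t^2$. The GREG-based pooling used in this
report is not a simple average, so this is an approximation rather than the method itself.

$$
\begin{aligned}
P_t &= \tfrac{1}{2}(e_{t-1} + e_t), \qquad P_{t+1} = \tfrac{1}{2}(e_t + e_{t+1}) \\
P_{t+1} - P_t &= \tfrac{1}{2}(e_{t+1} - e_{t-1}) \\
\operatorname{Cov}(P_{t+1}, P_t) &= \tfrac{1}{4}\sigma_t^2 \\
\operatorname{Var}(P_{t+1} - P_t)
  &= \operatorname{Var}(P_{t+1}) + \operatorname{Var}(P_t) - 2\operatorname{Cov}(P_{t+1}, P_t)
   = \tfrac{1}{4}\left(\sigma_{t+1}^2 + \sigma_{t-1}^2\right)
\end{aligned}
$$

Under these assumptions the shared year cancels out of the difference, so the change between
consecutive rolling estimates is half the change between years $t-1$ and $t+1$, and ignoring
the covariance term would misstate its standard error.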


For further information, see Research Paper: Weighting and Standard Error Estimation for 
ABS Household Surveys (Methodology Advisory Committee), Jul 1999 (cat. no. 
1352.0.55.029) 


Significance testing 


SIGNIFICANCE TESTING 


In this paper data from the Survey of Education and Work (SEW) are used to determine 
whether or not there have been statistically significant changes over time in key 
performance measures. A particular focus is on the degree to which larger population 
samples made possible by data pooling from successive surveys can reduce sample error 
and thereby enhance the detection of change where it exists. 


For comparing estimates between surveys, or between populations within a survey, it is 
necessary to determine whether differences are ‘real’ differences between the 
corresponding population characteristics or simply the result of sampling variability between 
the survey samples. One way to examine this is to determine whether the difference 
between the estimates is statistically significant. This is done by calculating the standard 
error (SE) of the difference between two estimates (x and y), and using that to calculate the 
test statistic: 


$$\frac{|x - y|}{SE(x - y)}$$


If the value of this test statistic is greater than 1.96 then there is good evidence of a 
statistically significant difference between the two population estimates with respect to that 
characteristic. Otherwise, it cannot be stated with confidence that there is a real difference 
between the population estimates. 
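
The test described above, the absolute difference between two estimates divided by the
standard error of that difference and compared with 1.96, can be sketched as follows. The
sketch assumes the two estimates come from independent samples, so that the SE of the
difference is the square root of the sum of the squared SEs; an optional covariance term is
included for cases where the samples overlap. The function names and figures are
hypothetical.

```python
import math


def se_from_rse(estimate, rse_percent):
    """Convert a relative standard error (per cent) to a standard error."""
    return abs(estimate) * rse_percent / 100.0


def significance_test(x, se_x, y, se_y, cov_xy=0.0, critical=1.96):
    """Return the test statistic |x - y| / SE(x - y) and whether it exceeds 1.96.

    Assumption: cov_xy is zero for independent samples; it must be supplied
    when the two estimates share sample (for example rolling pooled estimates).
    """
    se_diff = math.sqrt(se_x ** 2 + se_y ** 2 - 2.0 * cov_xy)
    statistic = abs(x - y) / se_diff
    return statistic, statistic > critical


# Hypothetical proportions (per cent) with their RSEs (per cent).
se_a = se_from_rse(84.0, 1.0)
se_b = se_from_rse(86.5, 0.9)
statistic, significant = significance_test(84.0, se_a, 86.5, se_b)
```

For these made-up figures the statistic is above 1.96, so the difference would be treated as
statistically significant at the 5% level.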


The ABS advises care in the interpretation of the results of significance testing, particularly 
for fine level aggregate comparisons. It should be first noted that significance testing only 
provides information on the probability of a specific event (eg. the difference between two 
means) occurring randomly when the null hypothesis is true. It does not test if the null 
hypothesis is actually true. Indeed, it should be expected that there will always be some 
level of difference between any two subpopulation sample estimates because of 
demographic and socio-economic heterogeneity in the samples. Hence if 100 fine level 
aggregate statistical significance tests are conducted (eg. at state by SEIFA decile level) 
using a significance level of 5% (alpha = 0.05) then 5% of the tests are expected by chance 
to show statistically significant differences even where no real difference exists. The 
interpretation of such analyses should not be arbitrarily conclusive (ie. that there is a 
definite difference) but consider the calculated 
p-value or z-score, the confidence interval of the estimates and the heterogeneity of 
Australian population characteristics at CD by SEIFA decile level to draw a balanced 
judgement on the likely level of evidence (weak, good, very good etc) for a real difference in 
two subpopulations versus the possibility of false positive findings. 
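
The arithmetic behind the caution on multiple comparisons can be checked directly. The
sketch below assumes the tests are independent and that the null hypothesis is true in every
case, which is a simplification; the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


expected = expected_false_positives(100)
prob_any = prob_at_least_one_false_positive(100)
```

With 100 tests at the 5% level, about five significant results are expected by chance alone,
and under these assumptions at least one false positive is close to certain.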


For more information, see ABS Research Paper: Socio-Economic Indexes for Individuals 
and Families (Methodology Advisory Committee), Jun 2007 (cat. no. 1352.0.55.086) 


Tables 


To access the Datacube tables referenced in this paper, see the file listed in the Downloads 
tab. 


About this Release 


This publication presents methodology and results, including experimental estimates of 
COAG national performance indicators, of data pooling using results from the Survey of 
Education and Work between 2007 and 2010. 


Explanatory Notes 


EXPLANATORY NOTES 


1 The statistics in this publication were compiled from data collected in the Survey of 
Education and Work (SEW), conducted throughout Australia in May each year as a 
supplement to the monthly Labour Force Survey (LFS). Respondents to the LFS who were 
in scope of the supplementary survey were asked further questions. 


2 The SEW provides a range of key indicators of educational participation and attainment of 
persons aged 15-74 years, along with data on people's transition between education and 
work. Specifically, the supplementary survey provides information on: people presently 
participating in education; level of highest non-school qualification; level of highest 
educational attainment; characteristics of people's transition between education and work; 
and data on apprentices. 


3 For more detail on how the data for the SEW were collected, together with links to other 
explanatory material such as a glossary and list of abbreviations, see the Explanatory Notes 
of the 2010 Survey of Education and Work (cat. no. 6227.0). 


4 The publication Labour Force, Australia (cat. no. 6202.0) contains information about 
survey design, sample redesign, scope, coverage and population benchmarks relevant to 
the LFS, which also apply to supplementary surveys. It also contains definitions of 
demographic and labour force characteristics, and information about telephone interviewing 
relevant to both the LFS and supplementary surveys. For more details on recent changes to 
the LFS, including the reinstatement of the full LFS sample in 2010, see Information Paper: 
Labour Force Sample Design, Nov 2007 (cat. no. 6269.0). 